Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compaction stats for filtered files #13136

Closed
wants to merge 2 commits into from

Conversation

jowlyzhang
Copy link
Contributor

As titled. This PR adds some compaction job stats, internal stats and some logging for filtered files.

Example logging:
[default] compacted to: files[0 0 0 0 2 0 0] max score 0.25, estimated pending compaction bytes 0, MB/sec: 0.3 rd, 0.2 wr, level 6, files in(1, 0) filtered(0, 2) out(1 +0 blob) MB in(0.0, 0.0 +0.0 blob) filtered(0.0, 0.0) out(0.0 +0.0 blob), read-write-amplify(2.0) write-amplify(1.0) OK, records in: 1, records dropped: 1 output_compression: Snappy

Test plan:
Added unit tests

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Member

@cbi42 cbi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -30,6 +32,7 @@ void CompactionJobStats::Reset() {
total_blob_bytes_read = 0;
total_output_bytes = 0;
total_output_bytes_blob = 0;
total_skipped_input_bytes = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: total_filtered_input_bytes to be consistent with the other new stats

Copy link
Contributor Author

@jowlyzhang jowlyzhang Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for the suggestion! I thought a bit about this and feel the action of filtering is applied to the file, and it ends up getting all the bytes in the file skipped, as opposed to the bytes are filtered. So I made this distinction in the name. The documentation of the field should make it clear where these fields come from.

@jowlyzhang
Copy link
Contributor Author

@cbi42 Thanks for the review!

@facebook-github-bot
Copy link
Contributor

@jowlyzhang merged this pull request in ef119c9.

@jowlyzhang jowlyzhang deleted the stats_for_filtered_files branch November 19, 2024 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants